|
In computing, stop words are words which are filtered out before or after processing of natural language data (text). Though stop words usually refer to the most common words in a language, there is no single universal list of stop words used by all natural language processing tools, and indeed not all tools even use such a list. Some tools specifically avoid removing these stop words to support phrase search. Any group of words can be chosen as the stop words for a given purpose. For some search engines, these are some of the most common, short function words, such as ''the'', ''is'', ''at'', ''which'', and ''on''. In this case, stop words can cause problems when searching for phrases that include them, particularly in names such as "The Who", "The The", or "Take That". Other search engines remove some of the most common words—including lexical words, such as "want"—from a query in order to improve performance.〔(Stackoverflow ): "One of our major performance optimizations for the "related questions" query is removing the top 10,000 most common English dictionary words (as determined by Google search) before submitting the query to the SQL Server 2008 full text engine. It’s shocking how little is left of most posts once you remove the top 10k English dictionary words. This helps limit and narrow the returned results, which makes the query dramatically faster".〕 Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the phrase and using the concept. == See also == * Text mining * Concept mining * Information extraction * Natural language processing * Query expansion * Stemming * Search engine indexing * Poison words * Function words 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「stop words」の詳細全文を読む スポンサード リンク
|